Conversation
Signed-off-by: Vincent Caux-Brisebois <vcauxbrisebo@nvidia.com>
Force-pushed from 6ab54d1 to 199a712
```yaml
vm:
  name: VM Checks
  runs-on: build-amd64
```
maybe for now we can leave these out. we'll add tests once we get things over to the driver architecture.
same as above. i think we stash this. once we land the initial implementation let's add the vm to our e2e tests.
nit: we don't want to check this in. i captured the plan for posterity under OS-53.
```diff
-# ./build-rootfs.sh [--base] [--arch aarch64|x86_64] [output_dir]
+# ./build-rootfs.sh [--base] [--gpu] [--arch aarch64|x86_64] [output_dir]
```
Will we need to publish two rootfs images?
Yes — the plan is two rootfs variants: a standard CPU rootfs (the current default) and a GPU rootfs built with --gpu. The GPU variant bakes in the NVIDIA driver packages (~550MB+) and nvidia-container-toolkit, which are required for the guest init to succeed with GPU passthrough. Keeping them separate avoids bloating the default image for non-GPU users. The launcher auto-selects the GPU rootfs when --gpu is passed (fixing that selection logic is part of the Codex P1 feedback in this PR). Long-term we could explore a layered approach, but two distinct builds is the simplest path for now.
posted via cursor for vince-brisebois
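The auto-selection described above could be sketched in a few lines of shell. This is an illustrative toy, not the launcher's actual code: `select_rootfs`, the `ROOTFS_DIR` default, and the image file names are all assumptions.

```shell
#!/bin/sh
# Toy sketch of rootfs auto-selection; select_rootfs, ROOTFS_DIR, and the
# image names are hypothetical, not openshell-vm internals.
select_rootfs() {
  dir="${ROOTFS_DIR:-/var/lib/openshell/rootfs}"
  if [ "$1" = "--gpu" ]; then
    # GPU variant bakes in the NVIDIA driver packages and
    # nvidia-container-toolkit, so pick the larger image only on demand.
    echo "$dir/rootfs-gpu.img"
  else
    echo "$dir/rootfs.img"
  fi
}

select_rootfs          # prints /var/lib/openshell/rootfs/rootfs.img
select_rootfs --gpu    # prints /var/lib/openshell/rootfs/rootfs-gpu.img
```

The point of the sketch is that the flag, not a separate config knob, decides which image boots, which is exactly the selection logic the Codex feedback flags as broken in the current patch.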
plan here is to ship a single kernel for both GPU and CPU? (i think this makes sense for now, just checking)
Yep, single kernel for both. The GPU-related kconfig additions (CONFIG_PCI, CONFIG_PCI_MSI, CONFIG_DRM, CONFIG_MODULES, CONFIG_MODULE_UNLOAD) are harmless on non-GPU boots — the kernel just probes and finds no PCI devices. The size overhead is negligible since these are small built-in drivers, not loadable modules. Avoids the complexity of maintaining and distributing two separate kernel builds.
posted via cursor for vince-brisebois
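For reference, a kconfig fragment covering the options named above would look like this (these are standard kernel config symbols; whether each ends up built-in or modular in the actual build is not specified here):

```
# GPU passthrough prerequisites; effectively no-ops on boots with no PCI GPU.
CONFIG_PCI=y
CONFIG_PCI_MSI=y
CONFIG_DRM=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
```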
Not GPU-related — this was a test isolation fix for `OPENSHELL_COMMUNITY_REGISTRY` that got bundled into the feature commit by accident. The `bare_name_expands_to_community_registry` test could flake if the env var is set externally.
Reverted from this branch. Will submit separately as a `test(core): fix env isolation in image resolution tests` PR so it doesn't muddy this diff.
posted via cursor for vince-brisebois
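The flake mechanism is easy to demonstrate: an exported env var leaks into every child process unless the test run explicitly unsets it. A minimal shell illustration (the registry URL is a placeholder; this is not the project's test harness):

```shell
# Simulate an externally exported registry override leaking into a test run,
# then isolate the run by unsetting the variable first.
export OPENSHELL_COMMUNITY_REGISTRY="https://example.invalid"

# Without isolation the child process sees the external value:
sh -c 'echo "leaked: ${OPENSHELL_COMMUNITY_REGISTRY:-unset}"'

# With isolation (env -u) the child falls back to its default:
env -u OPENSHELL_COMMUNITY_REGISTRY \
  sh -c 'echo "isolated: ${OPENSHELL_COMMUNITY_REGISTRY:-unset}"'
```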
we want to avoid adding the whole vm dep here if possible. let's talk about this more since i don't know if it's straightforward to remove or not.
Also ran this through Codex review; its feedback looks pretty good: "The patch introduces two functional blockers in the new GPU path: the gateway CLI drops the VFIO bind guard as soon as deployment returns, and `openshell-vm --gpu` still boots the non-GPU rootfs by default. It also leaves host IPv4 …" Full review comments:
…a unbind hardening
Signed-off-by: Vincent Caux-Brisebois <vcauxbrisebo@nvidia.com>
Force-pushed from 199a712 to a9491fd
Summary
Adds VFIO GPU passthrough support to openshell-vm using cloud-hypervisor as a second VMM backend alongside libkrun. Includes a full GPU bind/unbind lifecycle with safety checks, nvidia driver deadlock hardening (subprocess isolation with timeout, pre-unbind module cleanup, post-timeout verification), and an RAII guard that restores the original driver on exit.
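The bind/restore lifecycle with an on-exit guard can be modeled in a few lines of shell. This is a deliberately toy version: a temp file stands in for the real `/sys/bus/pci` sysfs interface, and a `trap` plays the role of the RAII guard's Drop; none of this is the PR's actual implementation.

```shell
#!/bin/sh
# Toy model of the driver-restore guard; the state file stands in for
# /sys/bus/pci/devices/<BDF>/driver, which the real code manipulates.
state="$(mktemp -d)/driver"
echo nvidia > "$state"            # device starts bound to the nvidia driver

orig="$(cat "$state")"            # remember the original driver
restore() { echo "$orig" > "$state"; }
trap restore EXIT                 # shell analogue of the RAII guard

echo vfio-pci > "$state"          # "bind" to vfio-pci for passthrough
echo "bound: $(cat "$state")"

restore; trap - EXIT              # guard fires (normally on process exit)
echo "restored: $(cat "$state")"
```

The design point the toy captures: whoever owns the guard decides when the original driver comes back, so dropping the guard early (as the Codex feedback notes the gateway CLI does) reverts the bind while the VM may still need the device.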
Related Issue
N/A
Changes
Testing
`mise run pre-commit` passes
Checklist